SUMMARY

A method is presented for comparing the strength of agreement of a group of rankings with an external ordering to the corresponding measure of concordance within the group. While the procedure is not model dependent, we illustrate the characteristics of interest using an existing model for a non-null distribution for a population of rankings. U-statistics and a jackknife with adjusted degrees of freedom are employed to set approximate confidence intervals on the contrast between the two measures of rank order agreement.
1. INTRODUCTION

The general problem of concordance among a group of judges as to the preference ordering of a set of $k$ objects can be extended from the classical problem of $m$ rankings to the problem of detecting agreement between the rankings and a specified predicted ordering of the objects, given by the external ranking $y = (y_1, \ldots, y_k)'$. Tests for the problem of $m$ rankings were proposed by Kendall & Babington Smith (1939) and Ehrenberg (1952). Tests for agreement between the judges and an external ranking were proposed by Jonckheere (1954), Lyerly (1952), and Page (1963). All of these tests are based on statistics that are distribution-free under the null hypothesis of random rankings, i.e., that there is no agreement among the judges in the population. However, it is often known in advance that there is some concordance among the judges. The question of interest then becomes whether the judges agree with the predicted ordering of the objects. This question should not be interpreted as one of perfect agreement. In other words, the issue is not whether every judge selects the ranking $y$ with probability one but whether the consensus ranking has a strong positive rank correlation with $y$.

This external ranking setting can also be viewed as a special case of the problem of two-group concordance in which the second population assigns probability one to the ranking $y$. Tests for two-group concordance have been given by Schucany & Frawley (1973), Hollander & Sethuraman (1978), and, more recently, Kraemer (1981).

2.  U-STATISTICS  FOR  INTERNAL  AND  EXTERNAL  AGREEMENT 

Quade, in a 1972 Technical Report at the Mathematical Centre, University of Amsterdam, uses U-statistic theory to examine the concordance of a population of judges as to the ordering of $k$ objects. Let $X_i = (X_{i1}, \ldots, X_{ik})'$, $i = 1, \ldots, n$, denote the rankings obtained from a sample of $n$ judges, each of whom independently ranks the $k$ objects. The coefficient of rank correlation between $X_i$ and $X_j$ will be denoted by $R(X_i, X_j)$. This coefficient may, for example, be taken to be the Spearman (1904) or Kendall (1938) rank correlation coefficient.

Quade's measure of concordance is given by

$$\rho = E\{R(X_i, X_j)\}, \quad i \neq j,$$

where the expectation is with respect to the multinomial probability distribution over the population of $k!$ rankings. This measure of concordance is referred to as the internal rank correlation, and $\rho = 0$ under the null hypothesis of random rankings. Furthermore, most investigators interpret $\rho > 0$ as concordance among the judges.

The external rank correlation, which is a measure of the agreement between the judges and the external ranking, will be defined as

$$\rho_1 = E\{R(X_i, y)\}.$$

This external rank correlation is positive if there is agreement between the judges and the predicted ordering.

The U-statistic estimators of $\rho$ and $\rho_1$ are

$$\bar{R} = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} R(X_i, X_j) \quad \text{and} \quad R_1 = \frac{1}{n} \sum_{i=1}^{n} R(X_i, y),$$

respectively. The tests for concordance introduced by Kendall & Babington Smith and Ehrenberg are based on $\bar{R}$. The external ranking tests due to Jonckheere, Lyerly, and Page are based on $R_1$ and test the null hypothesis of random rankings against the alternative that $\rho_1 > 0$.
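Both estimators are plain averages of pairwise rank correlations. A minimal Python sketch, assuming each ranking is stored as a tuple of untied ranks (one entry per object) and taking $R(\cdot, \cdot)$ to be Spearman's coefficient; the function names and the toy data are illustrative and not from the paper:

```python
from itertools import combinations
from typing import Sequence

Ranking = Sequence[int]  # ranks given to objects 1..k; no ties assumed


def spearman(x: Ranking, z: Ranking) -> float:
    """Spearman rank correlation between two untied rankings of the same k objects."""
    k = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return 1.0 - 6.0 * d2 / (k * (k * k - 1))


def internal_R(rankings: Sequence[Ranking]) -> float:
    """R-bar: mean of R(X_i, X_j) over all n(n-1)/2 pairs of judges."""
    pairs = list(combinations(rankings, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


def external_R(rankings: Sequence[Ranking], y: Ranking) -> float:
    """R_1: mean of R(X_i, y) over the n judges."""
    return sum(spearman(x, y) for x in rankings) / len(rankings)


# Hypothetical toy data: two judges agree with y, one reverses it.
judges = [(1, 2, 3), (1, 2, 3), (3, 2, 1)]
print(internal_R(judges))             # pairs give 1, -1, -1, so R-bar = -1/3
print(external_R(judges, (1, 2, 3)))  # judges give 1, 1, -1, so R_1 = 1/3
```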
3. COMPARISON OF INTERNAL AND EXTERNAL RANK CORRELATION

Although $\rho_1 > 0$ indicates that there is some agreement between the population of judges and the external ranking, there are situations in which there are marked differences between the consensus of the judges and the external ranking even though $\rho_1$ is positive. To illustrate this, consider a model introduced by Mallows (1957) and later studied by Feigin & Cohen (1978).

For simplicity we will only consider rankings of the objects that contain no ties. Let $x_0$ be a fixed vector denoting one of the $k!$ possible orderings of the objects, and let $d(x_0, x)$ denote a distance (in a rank correlation sense) between the orderings $x_0$ and $x$. A model which assigns equal probability to all rankings $x$ with the same value of $d(x_0, x)$ is then

$$P(x) = C(\theta)\,\theta^{d(x_0, x)}, \quad 0 < \theta < 1,$$

where $C(\theta)$ is the normalizing constant.
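For small $k$ the model can be evaluated exactly by enumerating all $k!$ rankings. A minimal sketch, assuming rankings are stored as rank vectors and $d$ is the number of discordant pairs (the case used in the example that follows); `mallows` and `discordant_pairs` are hypothetical helper names:

```python
from itertools import permutations


def discordant_pairs(x0, x):
    """d(x0, x): number of object pairs that x0 and x rank in opposite order."""
    k = len(x0)
    return sum(
        1
        for l in range(k)
        for m in range(l + 1, k)
        if (x0[l] - x0[m]) * (x[l] - x[m]) < 0
    )


def mallows(theta, x0):
    """Map each of the k! untied rankings x to P(x) = C(theta) * theta**d(x0, x)."""
    k = len(x0)
    weights = {x: theta ** discordant_pairs(x0, x) for x in permutations(range(1, k + 1))}
    total = sum(weights.values())  # total = 1 / C(theta)
    return {x: w / total for x, w in weights.items()}


probs = mallows(0.2, (1, 2, 3, 4))
print(sum(probs.values()))      # 1, up to rounding
print(1 / probs[(1, 2, 3, 4)])  # 1 / C(0.2)
```

For this distance the normalizing constant has the closed form $1/C(\theta) = \prod_{j=1}^{k} (1 + \theta + \cdots + \theta^{j-1})$; for $k = 4$ and $\theta = .2$ that is $1.2 \times 1.24 \times 1.248 \approx 1.857$, which the second printed value should match.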

Consider this model when $k = 4$, $x_0 = (1, 2, 3, 4)'$, and the distance measure $d(x_0, x)$ is taken to be the number of discordant pairs of objects between $x_0$ and $x$. Further, restrict attention to the case in which $R(\cdot, \cdot)$ is the Spearman rank correlation coefficient and the external ranking is $y = (1, 4, 2, 3)'$. Table 1 presents some values of $\rho_1$ and $\rho$ as functions of $\theta$ for this example. For the specific case of $\theta = .2$, the value $\rho_1 = .326$ is certainly large enough for the Page test to have reasonably good power at a moderate sample size. However, the expected ranking from the model is $\mu = (1.24, 2.06, 2.94, 3.76)'$, which differs from $y$ on the ordering of object 2 relative to objects 3 and 4.
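A short check on these numbers, added here rather than taken from the paper: for untied rankings the Spearman coefficient is an inner product of centered ranks, $R(x, z) = (x - c\mathbf{1})'(z - c\mathbf{1})/s$ with $c = (k+1)/2$ and $s = k(k^2-1)/12$. Because two judges rank independently, $\rho$ and $\rho_1$ then depend on the model only through $\mu = E(X_i)$:

$$\rho = \frac{(\mu - c\mathbf{1})'(\mu - c\mathbf{1})}{s}, \qquad \rho_1 = \frac{(\mu - c\mathbf{1})'(y - c\mathbf{1})}{s}.$$

With $k = 4$ (so $c = 2.5$ and $s = 5$), the rounded $\mu$ above, and $y = (1, 4, 2, 3)'$:

$$\rho_1 \approx \frac{(-1.26)(-1.5) + (-0.44)(1.5) + (0.44)(-0.5) + (1.26)(0.5)}{5} = \frac{1.64}{5} \approx .328,$$

$$\rho \approx \frac{2\,(1.26^2 + 0.44^2)}{5} = \frac{3.5624}{5} \approx .712,$$

which agree with the Table 1 entries for $\theta = .2$ (.326 and .709) up to the two-decimal rounding of $\mu$.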


TABLE 1

External and Internal Rank Correlation Coefficients
for Mallows' Model

   θ       ρ₁        ρ
   0      .400     1.000
  .1      .363      .865
  .2      .326      .709
  .3      .287      .547
  .4      .246      .394
  .5      .202      .263
  .6      .158      .158
  .7      .114      .083
  .8      .073      .034
  .9      .035      .008
 1.0     0.000     0.000
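Because $k! = 24$ is small, every entry of Table 1 can be recomputed exactly by enumeration. A sketch under the same assumptions (rank vectors, discordant-pairs distance, Spearman $R$); the helper names are illustrative. For $\theta = .2$ it gives $\rho_1 \approx .3262$ and $\rho \approx .7092$, and with $y = (1, 2, 3, 4)'$ it gives $\rho_1 \approx .842$, matching the value reported in the text for that choice of $y$.

```python
from itertools import permutations


def discordant_pairs(x0, x):
    k = len(x0)
    return sum(1 for l in range(k) for m in range(l + 1, k)
               if (x0[l] - x0[m]) * (x[l] - x[m]) < 0)


def spearman(x, z):
    k = len(x)
    return 1.0 - 6.0 * sum((a - b) ** 2 for a, b in zip(x, z)) / (k * (k * k - 1))


def table1_row(theta, x0=(1, 2, 3, 4), y=(1, 4, 2, 3)):
    """Exact (rho_1, rho) under Mallows' model, by enumerating all k! rankings."""
    perms = list(permutations(range(1, len(x0) + 1)))
    w = [theta ** discordant_pairs(x0, x) for x in perms]
    total = sum(w)
    p = [wi / total for wi in w]
    # rho_1 = E R(X, y) for a single judge.
    rho1 = sum(pi * spearman(x, y) for pi, x in zip(p, perms))
    # rho = E R(X_i, X_j) for two independent judges: sum over all pairs of rankings.
    rho = sum(pi * pj * spearman(x, z)
              for pi, x in zip(p, perms)
              for pj, z in zip(p, perms))
    return rho1, rho


for theta in (0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1.0):
    rho1, rho = table1_row(theta)
    print(f"{theta:>4}  {rho1:.3f}  {rho:.3f}")
```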

Notice in this situation that the internal rank correlation is $\rho = .709$, which is larger than $\rho_1 = .326$. This indicates that there is stronger agreement within the population concerning some consensus ranking than there is with this particular external ranking. This clearly implies that the consensus of the judges is not the external ranking. If the external ranking had been chosen to be $y = (1, 2, 3, 4)'$, then the external rank correlation would be $\rho_1 = .842$, which is larger than $\rho$. So it appears that a comparison of $\rho_1$ and $\rho$ should be made to determine "substantial" agreement with the external ranking.

Kraemer's two-group procedure is based on estimating a parameter that involves both the inter- and intra-group concordance. Using this approach in the external ranking setting would correspond to estimating

$$\rho^* = \frac{1}{2} + \frac{\rho_1}{\rho + 1},$$

where $\rho_1$ and $\rho$ are based on the Spearman $R(\cdot, \cdot)$. Extending Kraemer's two-group definition of "complete concordance" would require that $\rho^* = 1$. However, this can only occur when $X_i = y$, that is, when each judge in the population assigns the ranking $y$ with probability one. While it is appealing to relate the external ranking setting to the two-group problem, the condition $\rho^* = 1$ is too stringent to be used as a definition of agreement.
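Why $\rho^* = \frac{1}{2} + \rho_1/(\rho + 1)$ can reach 1 only in this degenerate case can be seen in one line when $R$ is Spearman's coefficient (a supporting argument added here, not taken from Kraemer's paper). Write $m = \mu - c\mathbf{1}$ and $b = y - c\mathbf{1}$, with $\mu = E(X_i)$, $c = (k+1)/2$ and $s = k(k^2-1)/12$, so that for independent, untied rankings $\rho = m'm/s$, $\rho_1 = m'b/s$ and $b'b = s$. By the Cauchy-Schwarz inequality and the arithmetic-geometric mean inequality,

$$\rho_1 = \frac{m'b}{s} \le \frac{\sqrt{m'm}\,\sqrt{b'b}}{s} = \sqrt{\rho} \le \frac{1 + \rho}{2}
\qquad\Longrightarrow\qquad
\rho^* = \frac{1}{2} + \frac{\rho_1}{\rho + 1} \le 1,$$

with equality only if $\rho = 1$ and $\rho_1 = 1$. For Mallows' model with $\theta = .2$ (Table 1), $\rho^* = .5 + .326/1.709 \approx .691$, well short of 1 even though $\rho_1$ is clearly positive.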

What is needed is a definition of "substantial" agreement that is stronger than $\rho_1 > 0$ but not as stringent as $\rho^* = 1$. Since $\rho_1 < \rho$ indicates that the consensus is not the external ranking, a reasonable definition of substantial agreement would be $\rho_1 > \rho$, that is, the external rank correlation is substantial relative to the internal rank correlation.

The internal and external rank correlations can be compared by examining the parameter $\rho_d = \rho_1 - \rho$. This parameter is estimable of degree two with kernel

$$\phi(X_i, X_j) = \tfrac{1}{2}\{R(X_i, y) + R(X_j, y)\} - R(X_i, X_j).$$

The U-statistic estimator of $\rho_d$ is

$$R_d = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} \phi(X_i, X_j) = R_1 - \bar{R}.$$
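The identity $R_d = R_1 - \bar{R}$ holds because each judge appears in $n - 1$ of the $\binom{n}{2}$ pairs, so averaging $\frac{1}{2}\{R(X_i, y) + R(X_j, y)\}$ over pairs returns $R_1$. A quick numerical check in Python, assuming Spearman's $R$ and using randomly generated (hypothetical) rankings; `phi` and `R_d` are illustrative names:

```python
import random
from itertools import combinations


def spearman(x, z):
    k = len(x)
    return 1.0 - 6.0 * sum((a - b) ** 2 for a, b in zip(x, z)) / (k * (k * k - 1))


def phi(xi, xj, y):
    """Kernel for rho_d: mean external correlation of the pair minus their mutual correlation."""
    return 0.5 * (spearman(xi, y) + spearman(xj, y)) - spearman(xi, xj)


def R_d(rankings, y):
    """U-statistic: mean of phi over all n(n-1)/2 pairs of judges."""
    pairs = list(combinations(rankings, 2))
    return sum(phi(a, b, y) for a, b in pairs) / len(pairs)


# Hypothetical random data, only to check the identity R_d = R_1 - R-bar.
rng = random.Random(1)
k, n = 5, 12
y = tuple(range(1, k + 1))
judges = [tuple(rng.sample(range(1, k + 1), k)) for _ in range(n)]

R1 = sum(spearman(x, y) for x in judges) / n
Rbar = sum(spearman(a, b) for a, b in combinations(judges, 2)) / (n * (n - 1) / 2)
print(R_d(judges, y), R1 - Rbar)  # agree up to floating-point rounding
```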


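Since $R_d$ is invariant to the jackknife procedure (the jackknife estimate computed from $R_d$ is $R_d$ itself), we can estimate the variance of $R_d$ using the jackknife, obtaining a multiple of a sample variance:

$$\hat{\sigma}_d^2 = \frac{4(n-1)}{n(n-2)^2} \sum_{i=1}^{n} (W_i - R_d)^2,$$

where

$$W_i = \frac{1}{n-1} \sum_{\substack{j=1 \\ j \neq i}}^{n} \phi(X_i, X_j).$$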
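The closed form follows from the leave-one-out identity $R_{d(-i)} - R_d = 2(R_d - W_i)/(n-2)$ for a U-statistic of degree two, substituted into the usual jackknife variance $\frac{n-1}{n} \sum_i (R_{d(-i)} - \bar{R}_{d(\cdot)})^2$. The sketch below, assuming Spearman's $R$ and using hypothetical random data, computes the variance both ways and also checks the invariance property (the bias-corrected jackknife estimate equals $R_d$); all function names are illustrative.

```python
import random
from itertools import combinations


def spearman(x, z):
    k = len(x)
    return 1.0 - 6.0 * sum((a - b) ** 2 for a, b in zip(x, z)) / (k * (k * k - 1))


def phi(xi, xj, y):
    return 0.5 * (spearman(xi, y) + spearman(xj, y)) - spearman(xi, xj)


def u_stat(rankings, y):
    pairs = list(combinations(rankings, 2))
    return sum(phi(a, b, y) for a, b in pairs) / len(pairs)


def jackknife_var_closed_form(rankings, y):
    """4(n-1) / (n (n-2)^2) * sum_i (W_i - R_d)^2, W_i = mean of phi(X_i, X_j) over j != i."""
    n = len(rankings)
    rd = u_stat(rankings, y)
    w = [sum(phi(rankings[i], rankings[j], y) for j in range(n) if j != i) / (n - 1)
         for i in range(n)]
    return 4 * (n - 1) / (n * (n - 2) ** 2) * sum((wi - rd) ** 2 for wi in w)


def jackknife_leave_one_out(rankings, y):
    """Direct jackknife: returns (variance estimate, bias-corrected estimate)."""
    n = len(rankings)
    loo = [u_stat(rankings[:i] + rankings[i + 1:], y) for i in range(n)]
    mean_loo = sum(loo) / n
    var = (n - 1) / n * sum((u - mean_loo) ** 2 for u in loo)
    corrected = n * u_stat(rankings, y) - (n - 1) * mean_loo
    return var, corrected


rng = random.Random(7)
y = (1, 2, 3, 4)
judges = [tuple(rng.sample(y, 4)) for _ in range(15)]  # hypothetical data
print(jackknife_var_closed_form(judges, y))
print(jackknife_leave_one_out(judges, y))
```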

Then the limiting distribution (as $n \to \infty$) of the studentized U-statistic $(R_d - \rho_d)/\hat{\sigma}_d$ is standard normal under mild regularity conditions; see Sen (1960). So $(R_d - \rho_d)/\hat{\sigma}_d$ can be used for approximate tests and confidence intervals.

In practice the sampling distribution of $(R_d - \rho_d)/\hat{\sigma}_d$ can be approximated by the Student-t distribution. Hinkley (1977) proposed a degrees of freedom estimator for the t approximation of studentized jackknife estimators. Palachek & Schucany (1981) have shown that this procedure improves the coverage when estimating $\rho$ using confidence intervals based on $\bar{R}$. This method is also useful for interval estimation of $\rho_d$; we denote the resulting degrees of freedom estimate for $R_d$ by $f_d$.
Thus an approximate $100(1-\alpha)\%$ confidence interval for $\rho_d$ is given by

$$R_d - t_{\alpha/2}(f_d)\,\hat{\sigma}_d < \rho_d < R_d + t_{\alpha/2}(f_d)\,\hat{\sigma}_d,$$

where $t_\alpha(\nu)$ is the $(1-\alpha)$th quantile of the Student-t distribution on $\nu$ degrees of freedom. Some Monte Carlo evidence of the adequacy of the approximate confidence coefficients in the closely related one-group setting may be found in Palachek & Schucany (1981).
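Once $R_d$, $\hat{\sigma}_d^2$ and $f_d$ are in hand the interval is a one-liner; the only subtlety is a t quantile at fractional degrees of freedom. The sketch below uses only the Python standard library, integrating the t density with Simpson's rule and inverting by bisection; in practice a library routine such as SciPy's `scipy.stats.t.ppf` would serve the same purpose. The helper names are illustrative.

```python
import math


def t_cdf(x, nu, steps=2000):
    """P(T <= x) for x >= 0, T ~ Student-t on nu (possibly fractional) degrees of freedom.

    Uses symmetry (P(T <= 0) = 0.5) plus Simpson's rule for the density on [0, x].
    """
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))

    def density(t):
        return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

    h = x / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return 0.5 + total * h / 3


def t_quantile(p, nu):
    """Value t with P(T <= t) = p, for 0.5 < p < 1, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(estimate, variance, df, alpha=0.05):
    """estimate -/+ t_{alpha/2}(df) * sqrt(variance), t_a(v) being the (1 - a)th quantile."""
    half_width = t_quantile(1 - alpha / 2, df) * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


print(t_interval(-0.135, 0.00339, 12.4))
```

With the values reported in Section 4 for $\rho_d$ ($R_d = -.135$, $\hat{\sigma}_d^2 = .00339$, $f_d = 12.4$) this gives approximately $(-.261, -.009)$.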

4.  EXAMPLE 


Consider the following hypothetical data set. Suppose that 20 judges have independently ranked 5 objects, leading to the rankings in Table 2. An investigator is interested in determining whether the population consensus of the rankers agrees with the ordering given by $y = (1, 2, 3, 4, 5)'$.

The Page test rejects $H_0$: "random rankings" in favor of the alternative that $\rho_1 > 0$. Moreover, the U-statistic approach (using the Spearman rank correlation coefficient) yields $R_1 = .585$, and the estimated variance of $R_1$ is

$$\hat{\sigma}_1^2 = \frac{1}{n(n-1)} \sum_{i=1}^{n} \{R(X_i, y) - R_1\}^2 = .00423.$$

Using the t approximation on $n - 1 = 19$ degrees of freedom (since the $R(X_i, y)$ are independent) leads to an approximate 95% confidence interval

$$.449 < \rho_1 < .721,$$

which indicates that there is some agreement between the population of rankers and the predicted ordering.

TABLE 2

Rankings of 5 Objects by 20 Judges

                  Objects
Judge     A    B    C    D    E
  1       1    3    2    4    5
  2       1    5    2    4    3
  3       1    5    2    3    4
  4       1    4    2    3    5
  5       1    3    2    4    5
  6       1    3    2    4    5
  7       1    4    3    2    5
  8       3    4    2    1    5
  9       1    4    2    3    5
 10       1    5    2    4    3
 11       1    2    3    4    5
 12       1    4    2    3    5
 13       1    4    2    3    5
 14       2    5    1    3    4
 15       1    5    2    3    4
 16       1    4    3    2    5
 17       1    4    2    3    5
 18       1    3    2    4    5
 19       1    5    4    3    2
 20       1    4    2    3    5
Totals   23   80   44   63   90
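The point estimates in this example can be recomputed directly from Table 2. The sketch below transcribes the table as tuples (judge by judge, objects A through E) and uses Spearman's $R$. It reproduces $R_1 = .585$, the variance .00423 and the average ranks, and gives $\bar{R} \approx .7195$ and $R_1 - \bar{R} \approx -.134$, in line with the rounded values .72 and $-.135$ used in the text. The jackknife variances and estimated degrees of freedom are not recomputed here.

```python
from itertools import combinations

# Table 2: ranks assigned to objects A, B, C, D, E by judges 1..20.
TABLE2 = [
    (1, 3, 2, 4, 5), (1, 5, 2, 4, 3), (1, 5, 2, 3, 4), (1, 4, 2, 3, 5),
    (1, 3, 2, 4, 5), (1, 3, 2, 4, 5), (1, 4, 3, 2, 5), (3, 4, 2, 1, 5),
    (1, 4, 2, 3, 5), (1, 5, 2, 4, 3), (1, 2, 3, 4, 5), (1, 4, 2, 3, 5),
    (1, 4, 2, 3, 5), (2, 5, 1, 3, 4), (1, 5, 2, 3, 4), (1, 4, 3, 2, 5),
    (1, 4, 2, 3, 5), (1, 3, 2, 4, 5), (1, 5, 4, 3, 2), (1, 4, 2, 3, 5),
]
Y = (1, 2, 3, 4, 5)  # predicted ordering


def spearman(x, z):
    k = len(x)
    return 1.0 - 6.0 * sum((a - b) ** 2 for a, b in zip(x, z)) / (k * (k * k - 1))


n = len(TABLE2)
r_ext = [spearman(x, Y) for x in TABLE2]                    # R(X_i, y)
R1 = sum(r_ext) / n                                         # external U-statistic
var_R1 = sum((r - R1) ** 2 for r in r_ext) / (n * (n - 1))  # variance of R_1
Rbar = sum(spearman(a, b) for a, b in combinations(TABLE2, 2)) / (n * (n - 1) / 2)
avg_ranks = [sum(col) / n for col in zip(*TABLE2)]          # per-object mean rank

print(f"R_1 = {R1:.3f}, var(R_1) = {var_R1:.5f}")
print(f"R-bar = {Rbar:.4f}, R_1 - R-bar = {R1 - Rbar:.4f}")
print("average ranks:", avg_ranks)
```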

The average internal rank correlation in this example is found to be $\bar{R} = .72$. Following the Palachek & Schucany approach, the jackknife variance estimate of $\bar{R}$ is found to be .00499, and the estimated degrees of freedom are $f = 8.21$. This leads to an approximate 95% confidence interval

$$.558 < \rho < .882.$$
Using the Bonferroni inequality, these two intervals hold simultaneously with an approximate confidence coefficient of at least .90. However, a sharp comparison between $\rho_1$ and $\rho$ is not possible because the two intervals overlap.

This problem can be circumvented by estimating $\rho_d$. The U-statistic obtained is $R_d = -.135$, and the jackknife variance estimate is $\hat{\sigma}_d^2 = .00339$. The estimated degrees of freedom are $f_d = 12.4$, which leads to an approximate 95% confidence interval

$$-.261 < \rho_d < -.009.$$

This interval is unambiguous in estimating that the external rank correlation is not substantial. In other words, the hypothesis that $\rho_d > 0$ is rejected at the .05 level in favor of the alternative that $\rho_d < 0$. Comparison of $y$ with the average ranks

(1.15, 4.0, 2.2, 3.15, 4.5)

shows that the predicted ordering has misplaced object B relative to objects C and D.

5.  CONCLUSIONS 

The method presented here contrasts the internal and external rank correlation. This allows one to determine whether the agreement with a predicted order is "substantial" in light of the strength of the agreement within the population.

The degrees of freedom estimator should be used for small and moderate-sized samples to avoid undercoverage of confidence intervals. Moreover, this procedure is adaptive in that the estimated degrees of freedom are sometimes larger than $n - 1$, leading to shorter, more precise intervals.

This  work  was  partially  supported  by  a  contract  with  the  Office  of  Naval 
Research  and  that  sponsorship  is  gratefully  acknowledged. 